2 November 2016
Given data, \(y\), we want to fit a model with parameters \(\theta\)
Bayes' theorem allows us to do this:
\[p(\theta | y) = \frac{p(\theta)p(y | \theta)}{p(y)} \]
\(p(\theta | y)\) is the posterior distribution, \(p(\theta)\) is the prior distribution, \(p(y|\theta)\) the likelihood and \(p(y) = \int_\theta p(y | \theta) p(\theta) \textrm{d}\theta\) the marginal likelihood
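As an illustration (not from the talk), Bayes' theorem can be applied on a discrete grid of parameter values; a minimal sketch for a Bernoulli likelihood with a uniform prior, where the data are an assumed 7 successes in 10 trials:

```scala
// Grid approximation of the posterior for a Bernoulli probability theta.
// Assumed data: 7 successes out of 10 trials; uniform prior over the grid.
val grid = (0 to 100).map(_ / 100.0)
val prior = grid.map(_ => 1.0 / grid.size)
val likelihood = grid.map(t => math.pow(t, 7) * math.pow(1 - t, 3))
val unnormalised = prior.zip(likelihood).map { case (p, l) => p * l }
val evidence = unnormalised.sum // plays the role of p(y) on the grid
val posterior = unnormalised.map(_ / evidence)
```

Since the prior is flat, the posterior mode sits at the maximum-likelihood value \(\theta = 0.7\).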
Once parameters have been determined, the parametric model can be used to answer questions about the process which generated the data, \(y\)
\[ Y(t_{0:N}) = \{y(t_0), y(t_1), \dots , y(t_N) \} \]
Streaming data arrives as an unbounded sequence, \(y(t_0), y(t_1), \dots \)
We want to forecast future observations, which have not yet been observed
We do this by fitting a parametric model which describes the process
Streaming data is often irregularly observed. Consider first a completely observed, discrete sensor for a lightbulb
The lightbulb can be considered either on or off; here is a regularly observed series of the lightbulb
Recording the lightbulb data irregularly can increase accuracy and reduce storage costs
In the regularly observed data there are 288 observations per day; in the irregular case there is one observation
With irregularly observed data, we can accurately answer questions like: "at 12pm on Tuesday, how many lights were on in The Core?"
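The irregular recording scheme can be sketched as keeping only the observations where the state changes (an assumed illustration, with integer timestamps):

```scala
case class Obs(time: Int, on: Boolean)

// A regularly sampled series: the bulb switches on at t=2 and off at t=4.
val regular = Vector(Obs(0, false), Obs(1, false), Obs(2, true), Obs(3, true), Obs(4, false))

// Keep the first observation plus every state change.
val irregular = regular.headOption.toVector ++
  regular.sliding(2).collect { case Vector(a, b) if a.on != b.on => b }.toVector

// A point-in-time query: the state at time t is the last recorded state at or before t.
def stateAt(t: Int): Option[Boolean] =
  irregular.takeWhile(_.time <= t).lastOption.map(_.on)
```

Storing only the change points still answers point-in-time queries exactly, which is what makes questions like "how many lights were on at 12pm?" answerable from the compressed record.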
\[\begin{align*} y(t_i)|\eta(t_i) &\sim \pi(y(t_i) | \eta(t_i)), \\ \eta(t_i)|\textbf{x}(t_i) &= g(F_{t_i}^T \textbf{x}(t_i)), \\ \textbf{X}(t_i) | \textbf{x}(t_{i-1}) &\sim p(\textbf{x}(t_i) | \textbf{x}(t_{i-1})) \end{align*}\]
\[\begin{align*} N(t_i) &\sim \textrm{Poisson}(N(t_i) | \lambda(t_i)) \\ \lambda(t_i) &= \exp\{ x(t_i) \} \\ \textrm{d}X(t) &= \mu \textrm{d}t + \sigma \textrm{d}W(t) \end{align*}\]
\[ \textbf{X}(t_i) | \textbf{x}(t_{i-1}) \sim \begin{pmatrix} p_1(\textbf{x}^{(1)}(t_i) | \textbf{x}^{(1)}(t_{i-1})) \\ p_2(\textbf{x}^{(2)}(t_i) | \textbf{x}^{(2)}(t_{i-1})) \end{pmatrix} \]
\[\begin{align*} y(t) &\sim \mathcal{N}(y(t) | \mu(t), \sigma) \\ \mu(t) &= F_t^T \textbf{x}(t) \\ \textrm{d}\textbf{X}(t) &= \alpha(\theta - \textbf{x}(t)) \textrm{d}t + \Sigma \textrm{d}W(t) \end{align*}\]
\[F_t = \begin{pmatrix} \cos(\omega t) \\ \sin(\omega t) \\ \cos(2\omega t) \\ \sin(2\omega t) \\ \cos(3\omega t) \\ \sin(3\omega t) \end{pmatrix}\]
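The seasonal vector \(F_t\) can be computed directly; a sketch assuming a daily period (\(\omega = 2\pi/24\), with time measured in hours) and three harmonics:

```scala
val omega = 2 * math.Pi / 24.0 // assumed daily frequency, time in hours

// Build F_t = (cos(wt), sin(wt), cos(2wt), sin(2wt), cos(3wt), sin(3wt)).
def seasonalF(t: Double): Vector[Double] =
  (1 to 3).toVector.flatMap { h =>
    Vector(math.cos(h * omega * t), math.sin(h * omega * t))
  }
```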
\[\begin{align*} N(t_i) &\sim \textrm{Poisson}(N(t_i) | \lambda(t_i)) \\ \lambda(t_i) &= \exp\{ x(t_i) \} \\ \textrm{d}X(t) &= \mu \textrm{d}t + \sigma \textrm{d}W(t) \end{align*}\]
\[\begin{align*} N(t_i) &\sim \textrm{Poisson}(N(t_i) | \lambda(t_i)) \\ \lambda(t_i) &= \exp\{ F_{t_i}^T \textbf{x}(t_i) \} \\ \textbf{X}(t_i)|\textbf{x}(t_{i-1}) &\sim \begin{pmatrix} p_1(\textbf{x}^{(1)}(t_i) | \textbf{x}^{(1)}(t_{i-1})) \\ p_2(\textbf{x}^{(2)}(t_i) | \textbf{x}^{(2)}(t_{i-1})) \end{pmatrix} \end{align*}\]
Streams can be thought of as an infinite list:

```scala
scala> val naturalNumbers = Stream.from(1)
naturalNumbers: scala.collection.immutable.Stream[Int] = Stream(1, ?)
```
Akka Streams: A Scala library for stream processing
Akka streams have three main abstractions:
A Source is a definition of a stream; it can be a "pure" stream, or a database or webservice call
A Flow is a processing stage, which can be used to transform a Source to another Source
A Sink defines what happens at the end of the stream, usually an effect such as writing to a file
foldLeft (or foldRight) can be used (with a seed) to reduce a foldable data structure by recursively applying a binary operation:

```scala
scala> Stream.from(1).take(10).foldLeft(0)(_ + _)
res0: Int = 55
```
foldLeft will reduce from the left: ((1 + 2) + 3) + 4
foldRight will reduce from the right: 1 + (2 + (3 + 4))
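The grouping matters for non-associative operations; subtraction, for example, gives different results under the two directions:

```scala
// foldLeft: (((0 - 1) - 2) - 3) - 4 = -10
val left = List(1, 2, 3, 4).foldLeft(0)(_ - _)

// foldRight: 1 - (2 - (3 - (4 - 0))) = -2
val right = List(1, 2, 3, 4).foldRight(0)(_ - _)
```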
reduce is equivalent to foldLeft for associative and commutative binary operations and can be applied in parallel
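Addition is associative and commutative, so reduce and foldLeft agree regardless of how the work is grouped:

```scala
val xs = (1 to 10).toVector
val viaReduce = xs.reduce(_ + _)      // grouping-independent
val viaFold = xs.foldLeft(0)(_ + _)   // strictly left-to-right
```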
scanLeft can be used to accumulate a running sum:

```scala
scala> Stream.from(1).take(10).scanLeft(0)(_ + _).foreach(n => print(s"$n, "))
0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55,
```
The latent state is assumed Markovian; conditioning on the full history reduces to conditioning on the most recent state:
\[p(x(t_n) | x(t_{n-1}), \dots, x(t_0)) = p(x(t_n) | x(t_{n-1}))\]
The dual of fold, unfold, can be used to simulate a random walk (the sigma value and the breeze and Akka Streams imports are assumed here):

```scala
import breeze.stats.distributions.Gaussian
import akka.stream.scaladsl.Source

val sigma = 1.0 // assumed step standard deviation
val x0 = Gaussian(0.0, 1.0).draw
val randomWalk = Source.unfold(x0)(a => Some((Gaussian(a, sigma).draw, a)))
```
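The same idea can be sketched without Akka or breeze, assuming a unit step size, using the standard library's Iterator.iterate:

```scala
import scala.util.Random

val rng = new Random(42) // fixed seed so the walk is reproducible
val stepSd = 1.0         // assumed step standard deviation

// Each state is the previous state plus Gaussian noise.
val walk = Iterator.iterate(0.0)(x => x + stepSd * rng.nextGaussian()).take(5).toList
```

Like unfold on a Source, Iterator.iterate is lazy: only the elements actually demanded (here, the first five) are computed.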
In order to perform the bootstrap particle filter, we need to know the time of the previous observation and the previous particle cloud
First define the observation datatype and the state of the bootstrap particle filter:

```scala
case class Data(time: Datetime, observation: Double)
case class FilterState(t0: Datetime, state: List[State])
```
A single filter step advances the particle cloud given a new observation:

```scala
val filterStep: (FilterState, Data) => FilterState
```

The filter can then be run using an Akka Flow and the scan operation; the initial state init contains the particle cloud and the time at the start of the application of the filter
```scala
def filter(init: FilterState) = Flow[Data].scan(init)(filterStep)
```
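To make the shape of filterStep concrete, here is a hypothetical, self-contained bootstrap step for a Gaussian random walk observed with Gaussian noise; the types, noise levels, and resampling scheme are all assumptions for illustration, not the talk's implementation:

```scala
import scala.util.Random

case class SimpleData(time: Double, observation: Double)
case class SimpleState(t0: Double, particles: Vector[Double])

val rng = new Random(1)
val stateSd = 1.0 // assumed transition noise
val obsSd = 0.5   // assumed observation noise

def gaussianPdf(x: Double, mean: Double, sd: Double): Double =
  math.exp(-0.5 * math.pow((x - mean) / sd, 2)) / (sd * math.sqrt(2 * math.Pi))

def simpleFilterStep(s: SimpleState, d: SimpleData): SimpleState = {
  val dt = d.time - s.t0
  // 1. Advance each particle through the transition kernel.
  val advanced = s.particles.map(x => x + math.sqrt(dt) * stateSd * rng.nextGaussian())
  // 2. Weight each particle by the likelihood of the new observation.
  val weights = advanced.map(x => gaussianPdf(d.observation, x, obsSd))
  // 3. Resample particles in proportion to their weights.
  val cumulative = weights.scanLeft(0.0)(_ + _).tail
  val total = cumulative.last
  val resampled = Vector.fill(advanced.size) {
    val u = rng.nextDouble() * total
    advanced(cumulative.indexWhere(_ >= u))
  }
  SimpleState(d.time, resampled)
}
```

With this (FilterState, Data) => FilterState shape, scanning a stream of observations with the step function yields the sequence of filtering distributions as particle clouds.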
Functional streams help, by handling unbounded data with a bounded memory footprint
Read the paper: arXiv:1609.00635